Open In Colab

NLP Analysis & Sentiment Classification on 568K+ Amazon Fine Food Reviews

Dataset:

The Amazon Fine Food Reviews dataset was downloaded from Kaggle and placed in my Google Drive for programmatic retrieval. It consists of reviews of fine foods from Amazon. The data span a period of more than 10 years and include all ~568,000 reviews. Reviews include product and user information, ratings, and a plain-text review. The dataset also includes reviews from all other Amazon categories. You can read more about the data at SNAP.

Approach:

Often in business, as well as in academia, we don't have the luxury of proper data pipelines that decide the scope and value of data beforehand; instead we derive value out of already collected, sometimes good but often messy, data. That was the case for this analysis, and that is why you'll find the goal of the project in section 3.

An analysis is a sequential process that involves a lot of back-and-forth questioning. I've tried to capture this natural question-and-answer process by documenting each question and its answer, either through text or through Python commands.

This approach serves at least two purposes. First, it lets me examine my own thinking: what kind of questions I am asking and how I approach a problem. Second, when we read someone else's analysis we sometimes find it hard to decipher their actions; here the question precedes each action, making the action self-explanatory at every step. I've tried to keep the questions as natural as possible.

Four or five questions into a section you may feel I could have skipped questions 1, 2 and 3 and jumped straight to questions 4 and 5. But when doing analysis we seldom reach the goal directly; it is usually preceded by missteps or extra steps, and I am deliberately documenting those missteps and extra steps.

I would love for you to read the analysis in its entirety, but you may feel it is too lengthy and you may not have enough time to go through it all. If that's the case, I strongly suggest you at least read the summary (section 9.1).

Key Results:

  1. The finalized model was an RNN (LSTM cells).
  2. The model had an initial train and validation accuracy of ~90%.
  3. The model had a sensitivity of 91.44%, a specificity of 93.56%, and a precision of 98.06%.
  4. Upon detailed error analysis, model accuracy and learning were found to be higher than the earlier assessment suggested.
  5. The model is close to the desired low-bias, low-variance region.
  6. The learned weights can be reused for similar sentiment tasks in other problem areas through transfer learning.
  7. The RNN architecture (including the LSTM cell) is extremely powerful for sequence-learning problems, especially in Natural Language Processing.

1. Initial Setup

1.1 Libraries

In [0]:
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
import re
from gensim.models import Word2Vec, KeyedVectors
import tensorflow as tf
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization, LSTM, Embedding
from tensorflow.keras.callbacks import TensorBoard
import time
from random import sample
from wordcloud import WordCloud, STOPWORDS
import seaborn as sns

1.2 Data

In [0]:
# Connecting to Google Drive
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
In [0]:
df = pd.read_csv("drive/My Drive/colab_files/Amazon_Reviews/Reviews.csv")
df.head()
Out[0]:
Id ProductId UserId ProfileName HelpfulnessNumerator HelpfulnessDenominator Score Time Summary Text
0 1 B001E4KFG0 A3SGXH7AUHU8GW delmartian 1 1 5 1303862400 Good Quality Dog Food I have bought several of the Vitality canned d...
1 2 B00813GRG4 A1D87F6ZCVE5NK dll pa 0 0 1 1346976000 Not as Advertised Product arrived labeled as Jumbo Salted Peanut...
2 3 B000LQOCH0 ABXLMWJIXXAIN Natalia Corres "Natalia Corres" 1 1 4 1219017600 "Delight" says it all This is a confection that has been around a fe...
3 4 B000UA0QIQ A395BORC6FGVXV Karl 3 3 2 1307923200 Cough Medicine If you are looking for the secret ingredient i...
4 5 B006K2ZZ7K A1UQRSCLF8GW1T Michael D. Bigham "M. Wassir" 0 0 5 1350777600 Great taffy Great taffy at a great price. There was a wid...

2. EDA

2.1 Question: How many unique values ?

In [0]:
df.nunique()
Out[0]:
Id                        568454
ProductId                  74258
UserId                    256059
ProfileName               218416
HelpfulnessNumerator         231
HelpfulnessDenominator       234
Score                          5
Time                        3168
Summary                   295742
Text                      393579
dtype: int64

2.2 Question: Are there any null values ?

In [0]:
df.isna().sum()
Out[0]:
Id                         0
ProductId                  0
UserId                     0
ProfileName               16
HelpfulnessNumerator       0
HelpfulnessDenominator     0
Score                      0
Time                       0
Summary                   27
Text                       0
dtype: int64

2.3 Question: How are the scores distributed ?

From the plots below, the dataset is heavily skewed towards positive ratings; this has to be kept in mind before any assertions are drawn.

In [0]:
def GetRelativeFrequency(df):
  # Sort by rating so bar positions and heights stay aligned
  series = (df.Score.value_counts().sort_index() / len(df)) * 100
  indx = series.index.values
  plt.bar(indx, series)
  plt.ylabel('%')
  plt.xlabel('Ratings')
  plt.title('Relative frequency')
  plt.xticks(indx)
  for i, v in enumerate(series):
      plt.text(indx[i] - 0.25, v + 0.01, str(round(v, 2)))
GetRelativeFrequency(df)

3. What is the Goal of this project?

Given that the dataset has very limited features, the scope of the analysis is severely restricted. More details about the users (like their demographics) would have opened doors for all kinds of analysis, but the textual data present can still be used to develop a sentiment detector or classifier.

I'd like to build a classifier that uses the whole review to classify sentiment broadly into 2 categories, Positive (4, 5) and Negative (1, 2, 3). Afterwards, ideally, I'd like to test it on food reviews on Twitter, where there are no ratings accompanying the text, but let's condition that on the availability of data. Further, if time permits, I'd also like to add more granularity to the classification task, though I believe this would needlessly shrink the generality of the classifier and hurt its practicality.

Further, I'd like to analyze in general what types of phrases or words make a review positive or negative, and check whether this reconciles with our understanding of language.

4. Data Preparation

Question 4.1: Is the data balanced? If not, do you want to artificially balance it?

No, the data is severely imbalanced towards positive reviews. For ideal results, and to give the model an honest chance to learn, the input classes should be appropriately weighted; otherwise the model would find it easy and beneficial to classify everything as positive.

Question 4.2: What approach do I want to take to balance the data ?

For now let's proceed with a weighted loss function, where I'll weight the loss such that the classifier classifies both sentiments with equal effect.
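To make the weighting concrete, here is a minimal sketch (the helper name and the toy 1:4 class ratio are invented for illustration) of deriving inverse-frequency class weights, which could then be passed to Keras via the `class_weight` argument of `fit`:

```python
import numpy as np

def inverse_frequency_weights(labels):
    """Weight each class by the inverse of its relative frequency,
    so the rarer class contributes more to the loss."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

# Toy labels: 1 negative for every 4 positives (roughly like this dataset)
toy = np.array([0, 1, 1, 1, 1] * 20)
print(inverse_frequency_weights(toy))  # {0: 2.5, 1: 0.625}
```

With such weights, misclassifying a negative review costs roughly four times as much as misclassifying a positive one, discouraging the "predict everything positive" shortcut.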

Question 4.3: Does the data need to be prepared or preprocessed before it is fed to the model ?

Yes.

In [0]:
# Cleaning the Text column: lowercase everything, strip HTML tags and punctuation
def CleanText(series):
    series = series.str.lower()
    # Removing html tags
    series = series.apply(lambda x: re.sub(r'<[^>]+>', '', str(x)))
    # Removing punctuation and digits (keep letters and spaces only)
    series = series.str.replace('[^a-zA-Z ]', '', regex=True)
    return series

df.Text = CleanText(df.Text)
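As a quick sanity check, the same three cleaning operations can be replayed on a single invented review string (the sample text below is hypothetical, not from the dataset):

```python
import re
import pandas as pd

# Hypothetical raw review, invented for illustration
sample = pd.Series(['Great taffy!<br />Arrived in 2 days... 10/10.'])

cleaned = sample.str.lower()
cleaned = cleaned.apply(lambda x: re.sub(r'<[^>]+>', '', str(x)))  # strip html tags
cleaned = cleaned.str.replace('[^a-zA-Z ]', '', regex=True)        # keep letters & spaces
print(repr(cleaned[0]))  # 'great taffyarrived in  days '
```

Note how removing the '!' fuses 'taffy' and 'arrived': aggressive punctuation stripping can glue adjacent words together, which is worth keeping in mind when inspecting the cleaned text.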

Question 4.3.1: Are the texts cleaned correctly ?

In [0]:
df.Text[55555]
Out[0]:
'what do you get when you combing cheerios with honey smacks and peanuts you get this  and it actually sounds good unfortunately the combination of the flavors is just okay not great and im not sure it is even good there is a weird cloying aftertaste and the cereal feels too dry the individual pieces too large and too crunchy it just wasnt for me the kidlets think it is okay but prefer honey nut cheerios or even raisin bran my fave to this  and thats not a good signbottom line ymmv but i wont be replacing this box with another one'

Question 4.3.2: Why are all the reviews split into lists of words ?

Because Gensim's Word2Vec expects its corpus as a list of tokenized sentences, i.e. a list of lists of words.

In [0]:
# Tokenizing every review for Gensim Word2Vec
words = [i.split() for i in df.Text.values]

Question 4.3.3: Why do we need word embeddings?

Word embeddings are word vectors with n dimensions, i.e. mappings of all unique words into an n-dimensional space. Using this mapping we can find similarity and dissimilarity between words across the dimensions. Word embeddings add much-needed context to each word and are therefore immensely useful for Natural Language Processing tasks. Without word embeddings, words are just one-hot encoded vectors containing no information about the relations between them.
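Word similarity then reduces to a simple vector operation such as cosine similarity. Here is a toy sketch with invented 3-dimensional "embeddings" (real Word2Vec vectors, as trained below, would be 300-dimensional):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two word vectors."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Invented 3-d vectors, for illustration only
emb = {
    'good':  np.array([0.9, 0.1, 0.0]),
    'great': np.array([0.8, 0.2, 0.1]),
    'awful': np.array([-0.7, 0.1, 0.2]),
}
print(cosine_similarity(emb['good'], emb['great']))  # close to 1: similar words
print(cosine_similarity(emb['good'], emb['awful']))  # negative: dissimilar words
```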

In [0]:
# Note: gensim 3.x API; in gensim >= 4.0 the `size` argument is `vector_size`
w2v = Word2Vec(words, min_count=1, size=300)

Question 4.3.4: What does the Word2Vec object look like ?

In [0]:
",".join(w2v.wv.__dict__.keys())
Out[0]:
'vectors,vocab,vector_size,index2word,vectors_norm'

Question 4.3.5: Are the dimensions of the word embeddings as expected (i.e. 300) ?

In [0]:
w2v.wv.vectors.shape
Out[0]:
(307893, 300)

Question 4.3.6: Are the index & words mapped correctly to each other ?

In [0]:
print(w2v.wv.index2word[542])
print(w2v.wv.vocab['cannot'].index)
cannot
542

Question 4.3.7: What does a word vector look like ?

In [0]:
print(w2v.wv.vectors[10,:].shape)
w2v.wv.vectors[10,:]
(300,)
Out[0]:
array([-0.18104742, -1.2497998 , -1.1319817 ,  1.8067105 ,  2.7872844 ,
       -2.010245  ,  0.3808805 ,  1.1534877 , -2.4066968 ,  0.32446784,
       -1.2265315 , -1.1333905 ,  2.5939713 ,  1.6596143 , -1.0352924 ,
        0.07724468,  1.8337735 ,  0.29021266, -0.8216739 ,  0.7200696 ,
       -1.0122299 , -0.11281136,  0.27038774,  0.6069221 , -1.6105305 ,
        0.8611002 , -0.52699697,  1.3249654 ,  1.3418592 ,  0.26424673,
        0.15923394,  2.30058   ,  0.12333897,  0.37867638, -2.004547  ,
        0.5796054 ,  0.02256189,  0.55653995,  0.28472346, -1.3323157 ,
        1.7913939 ,  0.01883489, -1.7407858 ,  0.6406356 , -0.9316717 ,
        0.3457021 , -0.3021748 , -0.2931823 , -0.64408696,  4.0293293 ,
       -1.0055621 , -1.5999483 , -0.42058903,  1.9321812 , -0.95228875,
        0.35857683, -0.46361202,  0.29202938, -1.225377  ,  1.2092105 ,
        1.8590527 ,  0.51404583, -1.8494363 ,  0.08440273,  1.2671006 ,
        0.37534112, -0.6422243 ,  0.41197422,  1.3806096 , -0.916566  ,
       -0.6066929 ,  0.8043858 , -0.41393176, -0.6373395 , -0.52019536,
        0.09994564,  0.38260245, -0.9210418 , -0.8930494 , -0.6610764 ,
        0.9461852 ,  1.0093952 , -1.164363  , -0.20406863, -1.6471367 ,
        0.5114375 , -0.7888277 ,  0.79338515, -0.5858847 ,  0.8784051 ,
        1.4670299 ,  1.5463855 ,  1.524068  ,  0.74477273,  2.6482408 ,
       -1.4991585 ,  1.6008915 , -0.93414557, -0.45095256, -0.8823175 ,
       -2.0718856 ,  2.2943592 , -1.5482563 ,  2.369163  , -0.52008134,
        0.76229143, -0.05222584,  1.2010634 , -0.92254305, -0.46274862,
        0.28051513,  1.2468987 ,  1.0041665 ,  0.24189016, -1.2344149 ,
        0.42661834, -0.10070921,  0.5782365 ,  0.95112324,  0.51980007,
       -0.42796633, -0.46124002, -0.9534698 ,  1.0186156 , -0.26475593,
       -1.3889225 ,  0.48424593,  1.8281032 , -1.1401953 , -1.537458  ,
       -0.2456198 , -0.26848346, -0.6636296 ,  0.05492309,  1.3376454 ,
       -0.4234075 , -1.6215054 ,  0.16531473, -0.12475172, -2.2263045 ,
       -0.406348  , -0.3560691 , -1.2267014 , -1.8441279 , -1.9210414 ,
       -2.7497017 ,  0.20614469, -0.3743033 ,  0.6482293 , -0.959609  ,
        0.87907165, -1.6210407 , -0.9624109 ,  0.6216253 ,  2.9224465 ,
        1.7364814 ,  0.5368722 ,  0.96438444,  0.48926356,  2.428751  ,
       -1.0192219 , -0.490602  ,  0.7312698 , -0.8849511 , -0.13022673,
        0.5049143 ,  0.74359745, -2.9420211 ,  1.5625987 ,  1.6567944 ,
        2.4864378 , -0.36053106,  0.9061109 , -1.81099   ,  0.64252627,
        0.20768294, -1.5368731 ,  3.215539  , -0.4026879 , -0.39301   ,
        1.1922219 , -0.39547405, -1.4036239 , -0.6661537 , -1.1376221 ,
        1.0213506 , -1.3915201 ,  0.4169806 ,  0.46241802, -0.21555145,
        0.784338  , -0.03467632,  1.1193306 , -1.3245391 , -1.2327467 ,
        0.9850823 ,  0.4533202 , -0.3240272 , -0.02953234, -0.18395852,
       -1.5352571 ,  0.33965102,  0.16528751,  1.5426238 , -2.102574  ,
       -0.6142356 , -0.963321  , -0.7218319 , -3.3107984 , -0.11682919,
        0.5524748 ,  1.3584698 , -2.3031864 , -0.35098213,  0.03981791,
       -1.3434101 ,  1.3432355 , -0.99546695, -0.19291009,  0.12467115,
       -0.6939354 , -0.15964845,  0.1039243 , -0.13582768, -0.32180354,
        1.173758  , -0.29954374, -1.1846739 ,  0.63810754,  0.5948176 ,
        0.5730314 ,  0.5675725 , -1.357686  , -1.7631574 ,  0.2913105 ,
        0.11838528, -0.37030005, -0.9421987 ,  3.3103516 ,  1.6551024 ,
       -1.0792018 , -1.275296  , -1.2532692 ,  1.8658856 , -0.26757142,
        0.32598072,  0.64035004,  2.082652  ,  0.64562964, -1.1560076 ,
       -1.7930465 ,  0.1261293 ,  1.9538591 ,  0.24014817, -0.62041926,
        0.04959441,  0.62699777, -0.7313769 , -0.6014762 , -0.35985568,
        1.0872704 , -0.00519367,  0.62855995, -1.8704492 , -0.5298076 ,
       -1.7779037 ,  2.8740919 ,  0.2648468 , -1.8656689 , -1.1798289 ,
       -0.18614814, -0.7772559 ,  1.808047  ,  0.58786756, -2.3228588 ,
        0.32460216,  0.9842996 , -0.92320204,  2.1951838 ,  0.622701  ,
        0.20515274, -0.0423336 ,  1.3473248 , -0.64104974, -0.10606533,
        0.5456224 , -0.5443206 , -1.2018715 , -0.2342521 ,  1.1587564 ,
        0.36773735,  0.98641545,  0.39894435, -2.360174  ,  0.9598206 ,
        0.5972056 , -1.4707253 ,  0.5950325 , -3.10964   ,  0.90043056],
      dtype=float32)

Question 4.3.8: What else needs to be done ?

The texts need to be converted into sequences of vocabulary indices.

In [0]:
seq_len = 100 # mandatory sequence length
unique_words_len = w2v.wv.vectors.shape[0]

# Function to pad reviews with fewer than 100 words
def PadSequence(sequence):
  pad_by = seq_len - len(sequence)
  for i in range(pad_by):
    sequence.append([unique_words_len])
  return sequence

# Function to get data in a network-ready format
def GetTrainingData(df):
  X = []
  Y = []
  for row in df.values:
      Y.append(1 if row[-4] > 3 else 0)
      sequence = [[w2v.wv.vocab[x].index] for x in row[-1].split()[:seq_len]]
      X.append(PadSequence(sequence))
  return X,Y
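The padding step can be checked in isolation with a toy sequence length and a stand-in pad index (both values below are invented for illustration; in the notebook the pad index is `unique_words_len` and the length is 100):

```python
# Minimal re-creation of the padding logic with toy values
seq_len = 5        # shortened from 100 for illustration
pad_index = 9999   # stands in for unique_words_len

def pad_sequence(sequence):
    # Append the pad index until the sequence reaches seq_len entries
    for _ in range(seq_len - len(sequence)):
        sequence.append([pad_index])
    return sequence

print(pad_sequence([[3], [17], [42]]))
# [[3], [17], [42], [9999], [9999]]
```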
In [0]:
# Get Data & labels
X,Y = GetTrainingData(df)
In [0]:
# Converting data and labels into a network-ready format
X = np.array(X).reshape(len(X), seq_len)
Y = np.array(Y)

Question 4.3.9: Why are the word embeddings manipulated in the cell below ?

The word embedding matrix originally had rows = number of unique words and columns = embedding size. An extra row of zeros has been added at the bottom to account for the padding index used in the sequences.

In [0]:
vocab = w2v.wv.vocab
index2word = w2v.wv.index2word
vectors = w2v.wv.vectors
word_emb = np.zeros(( vectors.shape[0]+1,  vectors.shape[1]))
word_emb[:vectors.shape[0],:] = vectors
In [0]:
# Saving the data so that I don't have to run through the above steps again
# when the runtime (or environment) gets disconnected
np.save("drive/My Drive/colab_files/Amazon_Reviews/X.npy", X) # data
np.save("drive/My Drive/colab_files/Amazon_Reviews/Y.npy", Y) # labels
np.save("drive/My Drive/colab_files/Amazon_Reviews/emb.npy",w2v.wv.vectors) # word embeddings
In [0]:
# Loading the data from drive
X = np.load("drive/My Drive/colab_files/Amazon_Reviews/X.npy") # data
Y = np.load("drive/My Drive/colab_files/Amazon_Reviews/Y.npy") # labels
vectors = np.load("drive/My Drive/colab_files/Amazon_Reviews/emb.npy") # word embeddings

Question 4.3.10: What does the last row look like ?

In [0]:
word_emb[-1,:]
Out[0]:
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

Question 4.3.11: What does the second-to-last row look like ?

In [0]:
word_emb[-2,:]
Out[0]:
array([ 3.46532231e-03,  2.76603680e-02,  1.60172787e-02, -1.85083468e-02,
       -1.15339560e-02,  1.42045263e-02, -8.34932434e-04, -1.54991010e-02,
        1.44582253e-03, -1.62088033e-02,  4.49324353e-03,  3.73439434e-05,
       -6.76051388e-03,  2.18155999e-02, -3.75771732e-03,  2.86660437e-02,
       -1.16999084e-02, -9.66009032e-03,  2.70798500e-03,  1.29472585e-02,
       -9.99964401e-03,  2.59627681e-02,  6.87613757e-03, -7.46838050e-03,
        3.28351930e-02,  1.98714919e-02,  2.54711462e-03, -7.97835470e-04,
        8.81644432e-03,  1.99013650e-02,  2.61793360e-02,  2.21661553e-02,
        2.25368906e-02,  2.37421715e-03, -6.21933548e-04, -1.79080339e-03,
       -9.54614673e-03,  7.28357816e-03,  9.57291294e-03,  2.30551083e-02,
        1.48102418e-02, -1.50850145e-02, -2.65265927e-02,  2.32690834e-02,
       -3.26300925e-03, -4.26180568e-03, -1.49887344e-02, -9.09052696e-03,
       -2.18890782e-04,  1.00828893e-02, -3.41025274e-03, -7.90948980e-03,
        1.34995291e-02, -1.61384158e-02,  1.17078237e-02, -9.16541275e-03,
       -1.75513830e-02, -6.62676664e-03,  3.92378494e-03, -3.49128507e-02,
        2.16235570e-03, -2.18007844e-02, -1.24644693e-02,  3.84444756e-05,
       -1.75388418e-02,  1.51604488e-02,  1.54674212e-02, -2.28953594e-03,
       -3.22622014e-03,  1.49715478e-02, -1.67071577e-02, -5.20567130e-03,
        2.68819258e-02, -5.03539480e-03, -1.53773348e-03,  8.47946946e-03,
       -1.47720352e-02, -2.52648704e-02, -1.11145917e-02,  5.55199152e-03,
        2.20587347e-02, -2.51965970e-03,  3.44922096e-02,  8.69484060e-03,
       -2.98640877e-03,  1.63232666e-02, -2.93789729e-02,  1.69698764e-02,
        2.63559702e-03,  2.02974994e-02,  3.98574537e-03, -1.47017895e-03,
        1.91256835e-03,  1.95443202e-02,  2.53328681e-02,  1.33868689e-02,
       -1.14411283e-02, -6.91669108e-03, -5.22566400e-03,  7.98663963e-03,
       -3.28458250e-02, -2.46275235e-02,  5.11894701e-03, -3.79799842e-03,
        1.36809060e-02,  8.68105330e-03,  5.87464590e-03,  1.87441967e-02,
        2.25151703e-02, -3.77139226e-02,  7.86265824e-03, -1.54119276e-03,
       -4.51707281e-03, -1.76458452e-02, -3.06492904e-03,  1.66806728e-02,
        8.52223311e-04, -6.27992325e-04,  9.21894796e-03, -1.14813698e-02,
        3.59651074e-02,  6.45991275e-03, -2.84711691e-03, -8.09630263e-04,
        1.15840277e-02,  1.10582998e-02,  9.24323685e-03, -6.82222378e-03,
       -3.73518351e-03, -4.67255153e-03, -1.58111043e-02,  1.92487054e-02,
        2.11668480e-03, -1.80824995e-02,  3.97482974e-04,  1.26969600e-02,
       -2.06453335e-02,  1.59024093e-02, -2.32979842e-02,  1.70211191e-03,
       -2.59287972e-02,  2.95025464e-02, -1.81309432e-02, -7.02514779e-04,
       -1.19952122e-02, -1.32992044e-02,  4.92850435e-04, -1.10130720e-02,
        1.18108829e-02,  3.58404852e-02,  5.47989691e-03, -9.34318174e-03,
       -9.20863729e-03,  2.08171736e-02,  1.23226466e-02, -7.08641578e-03,
        1.38170943e-02,  1.14179384e-02, -9.54045169e-03,  3.27164936e-03,
       -8.84726830e-03, -4.97072935e-03,  6.74474519e-03,  1.28790177e-03,
        1.90039407e-02,  2.81275692e-03, -1.15498798e-02,  1.81435645e-02,
       -1.20324651e-02, -2.51771673e-03,  1.41092287e-02,  2.46861670e-02,
       -4.75115189e-03,  2.48478912e-03,  8.23903829e-03, -1.79049447e-02,
        7.78101990e-03, -8.42549931e-03,  1.34704020e-02,  1.16594369e-02,
       -2.53247772e-03,  2.50339750e-02,  1.94632672e-02, -2.28211191e-02,
       -1.24190245e-02, -8.06612708e-03,  9.04691219e-03,  7.04061752e-03,
       -1.80579349e-02,  3.80228925e-03, -2.33203191e-02, -5.68851689e-03,
        1.75602809e-02,  7.58661516e-03,  1.28633680e-03,  3.78936976e-02,
       -2.14123987e-02, -5.71684679e-03, -1.02341883e-02, -5.15276706e-03,
        7.96789397e-03,  2.65804585e-03, -1.14104245e-03,  1.33778164e-02,
       -1.77189801e-02, -1.62457989e-04, -1.07245529e-02, -3.55782663e-03,
        3.81172867e-03, -2.81160162e-03, -7.36343395e-03, -1.02467211e-02,
       -1.25911934e-02, -1.01138297e-02, -4.35548415e-03,  1.57628506e-02,
       -1.40184222e-03, -3.51015781e-03,  8.48603249e-03, -6.07813569e-03,
        6.56607188e-03, -1.92225892e-02, -1.10122748e-03, -1.04854461e-02,
       -3.40366699e-02,  1.14742918e-02,  6.20316574e-03,  8.66728462e-03,
       -1.53687550e-02, -7.58477440e-03, -1.76020674e-02, -6.04429794e-03,
       -1.14499396e-02, -2.37502460e-03,  1.02339429e-03, -1.11884950e-02,
       -1.57796852e-02,  3.47329048e-03, -9.59737971e-03, -1.88846011e-02,
        2.27819337e-03, -1.28805600e-02,  2.82909395e-03,  1.88488234e-02,
       -3.69723095e-03,  3.37706413e-03,  8.83558474e-04,  1.99484266e-03,
        3.38274869e-04,  1.64738446e-02,  9.63088532e-04, -1.88980214e-02,
        6.25134958e-03,  2.84278183e-03, -5.03028976e-03,  1.34320129e-02,
        4.68744105e-03, -9.46035900e-04, -5.21621807e-03,  6.34021824e-03,
       -3.36548057e-03,  2.10409928e-02, -1.11700855e-02, -2.70251539e-02,
        1.43475067e-02, -1.69985869e-03,  1.60845425e-02,  1.34206498e-02,
       -8.38490762e-03, -1.24703823e-02,  1.18242484e-02,  6.46129670e-03,
       -1.06582977e-02, -1.38097843e-02,  1.69843286e-02,  5.22066979e-03,
        1.82993151e-02, -1.82312483e-03,  1.47590628e-02,  2.07619201e-02,
        6.48739468e-03, -7.73482770e-03,  4.18948056e-03,  1.47310807e-03,
        3.81267280e-03, -5.64462738e-03, -1.94946509e-02,  1.27252541e-03,
       -2.27777148e-03,  1.51581382e-02, -4.33815323e-04, -1.06804902e-02,
       -8.29666574e-03,  1.02714757e-02, -1.09922672e-02, -8.23284220e-03,
        4.46607126e-03, -2.36479263e-03, -1.40619669e-02, -3.69853666e-03])

Question 4.3.12: Is the shape of the manipulated word embedding matrix correct ?

In [0]:
word_emb.shape
Out[0]:
(307894, 300)

Question 4.3.13: Are the input & output ready to be fed into the model ?

Yes.

Question 4.3.14: What does the input & output shape look like ?

In [0]:
print(f"Data Shape: {X.shape}")
print(f"Labels Shape: {Y.shape}")
Data Shape: (568454, 100)
Labels Shape: (568454,)

5. Model Development

Question 5.1: What type of classifier should I use, traditional ones or deep learning models ?

A deep learning model. Since the number of records is very high, a deep neural network is likely to perform much better than traditional methods.

Question 5.2: What type of Deep Learning Model is suitable for this task ? Plain Neural Network, Convolutional Neural Network or Recurrent Neural Network ?

This is a no-brainer. Since this task involves sequence data, an RNN is likely to perform much better on this task.

Question 5.3: How many layers do you want to use ?

I don't know. I'll start with a basic model and iteratively change layers and other hyperparameters as needed.

In [0]:
def GetModel(shape):
  return Sequential([
      # Embedding layer initialized with the Word2Vec weights and frozen
      Embedding(word_emb.shape[0], word_emb.shape[1], input_length=seq_len,
                weights=[word_emb], trainable=False),
      LSTM(128, return_sequences=True, input_shape=shape),
      Dropout(0.2),
      BatchNormalization(),

      LSTM(128, return_sequences=True),
      Dropout(0.1),
      BatchNormalization(),

      LSTM(128),
      Dropout(0.2),
      BatchNormalization(),

      Dense(1, activation='sigmoid')
  ])
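Note that with `trainable = False` the Embedding layer contributes a large block of frozen weights. A back-of-the-envelope count, using the padded embedding matrix shape reported in section 4.3.12:

```python
# Padded embedding matrix shape reported earlier: (307894, 300)
vocab_rows, emb_dim = 307894, 300
embedding_params = vocab_rows * emb_dim
print(embedding_params)  # 92368200 weights, all frozen during training
```

Freezing this matrix means only the LSTM, BatchNormalization and Dense layers are actually trained, which keeps training tractable on Colab's time-limited GPU.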
In [0]:
model = GetModel(X.shape[1:])
opt = tf.keras.optimizers.Adam(lr=0.001, decay=1e-6)
# Compile model
model.compile(
    loss="binary_crossentropy",
    optimizer=opt,
    metrics=["accuracy"]
)
NAME = f"Model-{int(time.time())}"
tensorboard = tf.keras.callbacks.TensorBoard(log_dir="logs/{}".format(NAME))
class_weights = {0:4,1:1} # Class Weights to address data imbalance issue
model.fit(
    X, Y,
    batch_size=64,
    epochs=5,
    validation_split=0.02,
    callbacks = [tensorboard],
    class_weight=class_weights
)
Epoch 1/5
8705/8705 [==============================] - 273s 31ms/step - loss: 0.5393 - accuracy: 0.8469 - val_loss: 0.2982 - val_accuracy: 0.8701
Epoch 2/5
8705/8705 [==============================] - 274s 31ms/step - loss: 0.4320 - accuracy: 0.8852 - val_loss: 0.2697 - val_accuracy: 0.8774
Epoch 3/5
8705/8705 [==============================] - 273s 31ms/step - loss: 0.3958 - accuracy: 0.8961 - val_loss: 0.3268 - val_accuracy: 0.8686
Epoch 4/5
8705/8705 [==============================] - 274s 31ms/step - loss: 0.3786 - accuracy: 0.9009 - val_loss: 0.2499 - val_accuracy: 0.8862
Epoch 5/5
8705/8705 [==============================] - 274s 31ms/step - loss: 0.3621 - accuracy: 0.9053 - val_loss: 0.2350 - val_accuracy: 0.8998

Question 5.4: Am I happy with the model ?

Overall, 90% accuracy on the train and validation sets is quite good, if not great. But I'll reserve my judgement until I do further analysis of the model's performance.

Question 5.5: Why are you saving the model ?

With Google Colab you can access a GPU only for a limited period, and since it's a pain to train an RNN model on a CPU, it is better to save the model and load it again after the runtime gets disconnected to analyse the model's performance.

In [0]:
model.save("drive/My Drive/colab_files/Amazon_Reviews/model.h5")
In [0]:
model = load_model("drive/My Drive/colab_files/Amazon_Reviews/model.h5")

Question 5.6: What next ?

First I get predictions using the model.predict method, then I reshape the rows×1 array into a vector matching the shape of the label vector, and then I save the predictions to Drive so that I don't have to re-predict the output for 568K samples each time the runtime is disconnected.

In [0]:
pred = model.predict(X)
pred = np.reshape(pred, Y.shape)
np.save("drive/My Drive/colab_files/Amazon_Reviews/pred.npy",pred)
In [0]:
pred = np.load("drive/My Drive/colab_files/Amazon_Reviews/pred.npy", allow_pickle=True)

Question 5.7: What does the prediction output look like ?

It signifies the conditional probability P(Y = 1|X): the greater the probability, the greater the chance that the review is positive. If the output probability is greater than 0.5, the review is classified as positive, and otherwise as negative.

In [0]:
pred[0]
Out[0]:
0.98604774
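The final `Dense(1, activation='sigmoid')` layer produces this probability by squashing a real-valued logit into (0, 1). A minimal numeric sketch (the logit 4.26 is an invented value, chosen only because it maps close to the prediction shown above):

```python
import numpy as np

def sigmoid(z):
    """Squash a real-valued logit into (0, 1), interpretable as P(Y=1|X)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))   # 0.5: the decision boundary
print(sigmoid(4.26))  # ~0.986: a confidently positive prediction
```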

Question 5.8: Is the shape of the prediction array correct ?

In [0]:
pred.shape
Out[0]:
(568454,)

Question 5.9: Why am I rounding off probabilities ?

Because our output takes discrete values (0 and 1) while the network outputs the probability P(Y=1|X). I round it off to make it discrete, using 0.5 as the cut-off probability: if the predicted probability is greater than 0.5, I assign that review the output 1, and otherwise 0.

In [0]:
pred_round = np.round(pred)
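For probabilities in [0, 1], `np.round` and an explicit `> 0.5` cut-off produce the same labels (even at exactly 0.5, where `np.round` rounds half-to-even down to 0 and the strict comparison is also false). A small check with invented probabilities:

```python
import numpy as np

# Hypothetical predicted probabilities, invented for illustration
probs = np.array([0.986, 0.42, 0.5, 0.731, 0.049])

rounded = np.round(probs)                  # the approach used above
thresholded = (probs > 0.5).astype(float)  # explicit 0.5 cut-off

print(rounded)      # [1. 0. 0. 1. 0.]
print(thresholded)  # [1. 0. 0. 1. 0.]
```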

6. Model Analysis

Question 6.1: How many reviews are wrongly classified?

In [0]:
incorrect = Y[pred_round != Y]
incorrect.shape
Out[0]:
(45998,)
In [0]:
false_neg = sum(incorrect) # Calculating False Negatives
false_pos = len(incorrect) - false_neg # Calculating False Positives

Question 6.2: How many False Negatives?

In [0]:
false_neg
Out[0]:
37968

Question 6.3: How many False Positives?

In [0]:
false_pos
Out[0]:
8030

Question 6.4: What is the True Positive Rate / Sensitivity / Recall?

In [0]:
print(f"True Positive Rate: {round(1 - (false_neg/sum(Y==1)),4) * 100}")
True Positive Rate: 91.44

Question 6.5: What is the True Negative Rate / Specificity ?

In [0]:
print(f"True Negative Rate: {round(1 - (false_pos/sum(Y==0)),4) * 100}")
True Negative Rate: 93.56

Question 6.6: Is the model biased towards one class?

Since the sensitivity and specificity of the model are quite similar, the model is not biased towards either class.

Question 6.7: Is the model overly predicting the positive class (i.e. is the precision too low) ?

No.

In [0]:
print(f"Precision: {round(1 - (false_pos/sum(pred_round==1)),4) * 100}")
Precision: 98.06
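The three metrics above all derive from the same confusion-matrix counts. A minimal sketch with invented toy counts (the helper name and the numbers are mine, not the notebook's; the formulas match those used above):

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and precision from confusion-matrix counts."""
    return {
        'sensitivity': tp / (tp + fn),  # true positive rate / recall
        'specificity': tn / (tn + fp),  # true negative rate
        'precision':   tp / (tp + fp),
    }

# Toy counts, invented for illustration
m = classification_metrics(tp=90, fp=5, tn=80, fn=10)
print(m)  # sensitivity 0.9, specificity ~0.9412, precision ~0.9474
```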
In [0]:
# Segregating review indices
incorrect_ind_neg = list(np.where((Y != pred_round) & (Y == 0))[0]) #indices of incorrectly predicted negative reviews
#print(incorrect_ind_neg)
incorrect_ind_pos = list(np.where((Y != pred_round) & (Y == 1))[0]) #indices of incorrectly predicted positive reviews
#print(incorrect_ind_pos)
correct_ind_neg = list(np.where((Y == pred_round) & (Y == 0))[0]) #indices of correctly predicted negative reviews
#print(correct_ind_neg)
correct_ind_pos = list(np.where((Y == pred_round) & (Y == 1))[0]) #indices of correctly predicted positive reviews
#print(correct_ind_pos)
In [0]:
df["seq_len_text"] = df.Text.apply(lambda x: (" ").join(x.split()[:seq_len])) # Setting up new column that matches with the input data to the model
df.head()
Out[0]:
Id ProductId UserId ProfileName HelpfulnessNumerator HelpfulnessDenominator Score Time Summary Text seq_len_text
0 1 B001E4KFG0 A3SGXH7AUHU8GW delmartian 1 1 5 1303862400 Good Quality Dog Food i have bought several of the vitality canned d... i have bought several of the vitality canned d...
1 2 B00813GRG4 A1D87F6ZCVE5NK dll pa 0 0 1 1346976000 Not as Advertised product arrived labeled as jumbo salted peanut... product arrived labeled as jumbo salted peanut...
2 3 B000LQOCH0 ABXLMWJIXXAIN Natalia Corres "Natalia Corres" 1 1 4 1219017600 "Delight" says it all this is a confection that has been around a fe... this is a confection that has been around a fe...
3 4 B000UA0QIQ A395BORC6FGVXV Karl 3 3 2 1307923200 Cough Medicine if you are looking for the secret ingredient i... if you are looking for the secret ingredient i...
4 5 B006K2ZZ7K A1UQRSCLF8GW1T Michael D. Bigham "M. Wassir" 0 0 5 1350777600 Great taffy great taffy at a great price there was a wide... great taffy at a great price there was a wide ...
In [0]:
# Method to get Word Cloud given a dataframe
def GetWordCloud(df):
  text = ""
  for i in df.values:
    text = f"{text} {i[0]}"
  
  stopwords = set(STOPWORDS)
  wordcloud = WordCloud(width = 800, height = 800, 
                background_color ='white', 
                stopwords = stopwords, 
                min_font_size = 10).generate(text) 
  
  # plot the WordCloud image                        
  plt.figure(figsize = (8, 8), facecolor = None) 
  plt.imshow(wordcloud) 
  plt.axis("off") 
  plt.tight_layout(pad = 0) 
    
  plt.show()

# Method to sample reviews given indices & number of samples
def GetSampleReviewsWithDetails(ind, samples, col_indices = [-1, -5]):
  ind = sample(ind,samples) # sampling from indices provided
  for i in ind:
    print(df.iloc[i,col_indices[0]])
    print("\n")
    print(f"Rating - {df.iloc[i, col_indices[1]]} || Predicted Probability - {pred[i]}")
    print("\n")

Question 6.8: What do the correctly predicted positive reviews look like?

I cannot look into all the reviews since the number is huge, therefore I sampled 3 values from the positive reviews which were correctly predicted as positive. All 3 reviews had a 5-star rating, and going by the text nothing seems out of place.

In [0]:
GetSampleReviewsWithDetails(correct_ind_pos,3)
These are excellent chips, except Amazon couldn't get the order straight and twice sent me cases of Red Hot Blues. Great. What does a single person do with 2 cases of already flavored chips? Yep, give them away. I liked the Red Hots, but wanted these with no seasoning (I make my own salsas) for every day and Amazon just couldn't pull it off. The case was even clearly labelled. Go figure. Amazon did refund my money fully.


Rating - 5 || Predicted Probability - 0.8259775042533875


This was my first order using Amazon's subscriber service, and I couldn't be more pleased. This saves me going on a 50 mile round trip to the nearest "big" store (I live in a rural area). And with today's gas prices, what could be more convenient?<br /><br />The shipping was free, and the package arrived at my door within days. I will (and already have) be using this service more and more.<br /><br />Thanks Amazon!


Rating - 5 || Predicted Probability - 0.9966561794281006


I've been buying the Skippy Natural Peanut Butter for a few years, but once I found it on Amazon, I was a happy camper. Generally the best price anywhere and I love getting it in a larger package so I don't have to replace it quite as often. We use a lot of peanut butter around here!


Rating - 5 || Predicted Probability - 0.9987438917160034


Question 6.9: What are the most popular words in correctly predicted positive reviews?

Even though a word cloud by itself is not very useful for a sequence-learning task, it provides a broad but intuitive way to show the most frequent words in a corpus (distribution) by size. Some of the frequently occurring words and phrases, such as "highly recommended" and "tastes great", perfectly reconcile with the distribution they come from.
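For a quick non-graphical check of the same idea, token frequencies can also be counted directly. A minimal sketch using `collections.Counter`; the tiny `STOPWORDS` set and the sample reviews here are illustrative stand-ins, not the notebook's data or the full `wordcloud.STOPWORDS` list:

```python
from collections import Counter

# Illustrative stopword set (the notebook uses wordcloud.STOPWORDS).
STOPWORDS = {"the", "a", "is", "and", "it", "this", "at"}

def top_words(texts, n=3):
    """Return the n most common non-stopword tokens across texts."""
    counts = Counter()
    for text in texts:
        counts.update(w for w in text.lower().split() if w not in STOPWORDS)
    return counts.most_common(n)

# Hypothetical sample reviews
reviews = [
    "great taffy at a great price",
    "tastes great highly recommended",
    "highly recommended great snack",
]
print(top_words(reviews))  # 'great' comes out on top with 4 occurrences
```

The same function could be pointed at `df.iloc[correct_ind_pos, -1]` to cross-check what the word cloud renders visually.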

In [0]:
GetWordCloud(df.iloc[correct_ind_pos,-1].to_frame())

Question 6.10: What does the distribution of rating look like for correctly predicted positive reviews?

In [0]:
GetRelativeFrequency(df.iloc[correct_ind_pos,:])

Question 6.11: What do the correctly predicted negative reviews look like?

Again, looking at the 3 randomly sampled negative reviews, everything is working as expected.

In [0]:
GetSampleReviewsWithDetails(correct_ind_neg,3)
be aware that you will receive only jar not if you order this item the picture showing jars is misleading


Rating - 2 || Predicted Probability - 0.0657813549041748


i found this coffee to be great at first but as i started having it more and more i came to detest it on the large setting the coffee is way too watery also the flavor is very very overpowering and kind of tastes like bananasit is a pretty good midafternoon or evening desert coffee but other than that it is kinda gross


Rating - 1 || Predicted Probability - 0.01786443591117859


the taste is great but it gave me a terrible migraine from the msg those who are sensitive beware also msg addition is usually a way to fake true flavor


Rating - 1 || Predicted Probability - 0.03576955199241638


Question 6.12: What are the most popular words in correctly predicted negative reviews?

In [0]:
GetWordCloud(df.iloc[correct_ind_neg,-1].to_frame())

Question 6.13: What does the distribution of rating look like for correctly predicted negative reviews?

In [0]:
GetRelativeFrequency(df.iloc[correct_ind_neg,:])

Question 6.14: What do the incorrectly predicted positive reviews look like?

I've sampled 10 incorrectly predicted positive reviews and provided my own labels (i.e. how I feel about the sentiment) on each review, along with whether the model performance is explainable or satisfactory. Review # is the order in which the reviews are output in the cell below.

| # | Rating | Original Label | My Label | Predicted Probability | Model Performance |
|---|--------|----------------|----------|-----------------------|-------------------|
| 1 | 4 | Positive | Incoherent, Mixed Sentiments | 0.18 | Satisfactory |
| 2 | 5 | Positive | Mixed Sentiments | 0.17 | Satisfactory |
| 3 | 4 | Positive | Mixed Sentiments | 0.26 | Satisfactory |
| 4 | 4 | Positive | Mixed Sentiments | 0.46 | Satisfactory |
| 5 | 4 | Positive | Suggestions | 0.31 | Satisfactory |
| 6 | 5 | Positive | Mixed Sentiments | 0.08 | Satisfactory |
| 7 | 5 | Positive | Suggestions | 0.43 | Satisfactory |
| 8 | 4 | Positive | Positive Sentiments, Suggestions | 0.32 | Satisfactory |
| 9 | 5 | Positive | Positive Sentiments | 0.18 | Not Satisfactory |
| 10 | 4 | Positive | Mixed Sentiments | 0.17 | Satisfactory |
In [0]:
GetSampleReviewsWithDetails(incorrect_ind_pos,10)
 organic dark chocolate chips organic sugar organic chocolate liquor organic cocoa butter organic soy lecithin added as an emulsifier organic vanilla   from the ingredients list bobos oat bars chocolateseveral weeks ago i reviewed bobos oat bars all natural banana ounce packages pack of  to which i awarded four stars based on its excellent oat flavor and mouth feel however it suffered the loss of a fifth star because the ostensible banana taste was so subtle as to be virtually nonexistentbecause i love chocolate i decided to give bobo another shot with the chocolate version of her oat snackdark chocolate is on the bars ingredients list  as chips no less buts its a complete mystery to me where the essence of chocolate is lurking moreover there isnt a chocolate chip in sight i pulled the bar apart and examined it carefully granted the bar has a darker shade of color than the other ones of the brand ive tried so that at least implies chocolate is present and theres a faint nuance of some other flavor underlying the taste of oats but if anything its suggestive of molassesmind you i truly like these bars for the reasons previously stated and ill be sure to enjoy the remaining eleven of the dozen chocolate ones i purchased but as near as i can determine to date with the three types ive eaten to claim theres any flavor present other than that of oats is a complete fiction and ive had to resist the impulse to award three stars or less for blatantly false advertising


Rating - 4 || Predicted Probability - 0.18392610549926758


i like this blend but the price is ridiculous found them at winn dixie and other flavors  for a pack of 


Rating - 5 || Predicted Probability - 0.16662359237670898


the product was nice but there were no tips or recipes on how to use each ingredient nonetheless it was a great buy


Rating - 4 || Predicted Probability - 0.2598097324371338


these are light cookies with nice flavor  not overpowering but you definitely get the citrus in the background  i found them a bit on the dry side but my husband loves them


Rating - 4 || Predicted Probability - 0.4605185389518738


tastes fine cooks up well only improvement would be to fix what may be a machinery glitch that produces extra tiny pieces after cooking dont see them when i pour the uncooked pasta into the water but once boiled it looks as if someone has sprinkled in some short lengths of vermicelli which i guess are some end or side shavings of the corkscrew pasta tastes no different but they are a little added inconvenience because some can end up out of the colander accumulating down at the bottom of the sink etc so if someone can just adjust the pasta cutter extruder or whatever it is called or fix what is causing the problem this one small glitch in an otherwise fine product will be solved


Rating - 4 || Predicted Probability - 0.31887829303741455


my dogs love these and keep them happy for hours  i had a problem with the third party supplier though  i ordered  of them and they only sent six labeled as moo bully sticks the rest were unwrapped unlabeld bully sticks  i order only moo sticks as i trust them to be free range i did not like getting unlabeled bully sticks i should have gotten all of them as pictured  the vendor said they were exactly the same and said that some of their customers like them unwrapped  really  i have since found a different trusted source for my bully sticks


Rating - 5 || Predicted Probability - 0.08764034509658813


the glucotest really works i wanted an alternative to sticking my diabetic cats ears to test his glucose levels he was getting inconsistent readings at the vet because he was too stressed out there i took him off all grain fed him canned meat catfood then switched to the raw frozen meat diet to successfully use the  purina glucotest confetti make sure you mix it thoroughly into your cats litter so it is all submerged any pieces floating on top i found did not give an accurate readingthe pieces that were submerged showed consistent readings and were verified by a recent blood test done at the vets when he was calm and not stressed i checked the pieces by scooping up the clump of pee from litterbox and placing it on a piece of newspaper that gave me room to pick out submerged confetti pieces with a pair of tweezers that sounds wierd but i am so pleased to be able to help my cat get healthy he is stable now and getting better


Rating - 5 || Predicted Probability - 0.4285757839679718


the cookies have lighter wafers than other creme filled cookies ive tried but if you like a lot of creme in between the wafers these cookies may not be for you although i do like the flavor of all of the enclosed varieties


Rating - 4 || Predicted Probability - 0.3157707452774048


its like a sticky softer brown rice hard to believe its actually brown rice wont eat any other rice now


Rating - 5 || Predicted Probability - 0.18326765298843384


i like that they are natural and made in the usa but they dont have a lot of stuff in the middle as advertised and my dogs prefer the larger  to  ones


Rating - 4 || Predicted Probability - 0.1711539626121521


Question 6.15: What are the most popular words in incorrectly predicted positive reviews?

In [0]:
GetWordCloud(df.iloc[incorrect_ind_pos,-1].to_frame())

Question 6.16: What does the distribution of rating look like for incorrectly predicted positive reviews?

In [0]:
GetRelativeFrequency(df.iloc[incorrect_ind_pos,:])

Question 6.17: What do the incorrectly predicted negative reviews look like?

I've sampled 10 incorrectly predicted negative reviews and provided my own labels (i.e. how I feel about the sentiment) on each review, along with whether the model performance is explainable or satisfactory. Review # is the order in which the reviews are output in the cell below, where you can read each review.

| # | Rating | Original Label | My Label | Predicted Probability | Model Performance |
|---|--------|----------------|----------|-----------------------|-------------------|
| 1 | 3 | Negative | Negative Sentiments, Criticism | 0.86 | Not Satisfactory |
| 2 | 3 | Negative | Mixed Sentiments, Edited Review | 0.56 | Satisfactory |
| 3 | 3 | Negative | Positive Sentiments, Usage Caution | 0.77 | Satisfactory |
| 4 | 3 | Negative | Positive Sentiments | 0.69 | Satisfactory |
| 5 | 3 | Negative | Positive Sentiments | 0.77 | Satisfactory |
| 6 | 3 | Negative | Mixed Sentiments, Suggestion | 0.62 | Satisfactory |
| 7 | 3 | Negative | Positive Sentiment, Negative Prediction | 0.88 | Satisfactory |
| 8 | 3 | Negative | Positive Sentiments, Incoherent | 0.88 | Satisfactory |
| 9 | 2 | Negative | Negative Sentiments, Positive Expectations | 0.69 | Satisfactory |
| 10 | 3 | Negative | Positive Sentiments | 0.81 | Satisfactory |
In [0]:
GetSampleReviewsWithDetails(incorrect_ind_neg,10)
this is one serving please stop trying to make it seem more healthful than it is it is very tasteful but is not as low fat as it seems to be trying to be when i cannot trust you whos next  colp


Rating - 3 || Predicted Probability - 0.8607024550437927


i like envirokidz crispy rice bars but these peanut choco ones are the best  they have actual bits of peanut mixed in which gives them a nice little crunch  one caveat  although these bars dont taste overly sweet they do have g of sugar each  health food theyre not  but they are a good substitute for candy bars which tend to be much higher in fat and sugaramazons price is the best ive found thus far especially if you do the subscribe  save optionlater edit  just downgraded my rating because the quality of the chocolate drizzle has really gone downhill  theyve also gotten gooier which i personally find unappealing


Rating - 3 || Predicted Probability - 0.5625847578048706


i have three geman shepherds and they love the chews  they have done a great job of cleaning their teeth  just one caution i would not leave a dog alone with a chew they have a tendency to shallow large portions which could be a choking hazard


Rating - 3 || Predicted Probability - 0.7699354887008667


i managed to get a bag rather inexpensively at a flea market cat food scratch and dent sale since my cats are rescued feral cats they eat what is on sale they did tear into the food as soon as it hit the dish and seemed to enjoy it in fact they enjoyed it so much they threw it up all over my carpet so they could go eat some more next week they will be getting something else they seem to like iams which was grannys favorite


Rating - 3 || Predicted Probability - 0.6854910254478455


i was impressed by the spicy not too hot not too spicy its a pretty cheap product so dont expect much flavor wise but if your looking for something to sustain you go right on aheed and dive into a bouwl of nissin hot and spicy chicken flavored noodles


Rating - 3 || Predicted Probability - 0.7714532613754272


bobs red mill gluten free brownies bobs red mill glutenfree brownie mix ounce packages pack of  are to die for these cookies not so much the brownies are so rich and chocolatey that i raced back to amazon and ordered these cookies there arent many chips and the ones that are there look like chocolate chips but they donr taste like them i gave it  stars because this does make an easy base for glutendairy free cookies just add more of your own dairy free choc chips the choc in the brownies is the premium ghiradelli chocolate so bob make the chips in these cookies out of the ghiradelli too and youd really have something


Rating - 3 || Predicted Probability - 0.6232295036315918


this cocoa dark chocolate bar from newmans own is tasty enough picky chocolate lovers might find it too sweet but it whacks the craving as far as im concerned i just find it way too hard its a quarter inch thick so its a solid chunk of chocolate when you bite into it or break off a piece ive been putting it in the microwave for a few seconds to soften it but i can see disaster looming there


Rating - 3 || Predicted Probability - 0.8887010812759399


this item came up on amazon at the right time i was getting tired od s seeds and was craving pumpkin seeds


Rating - 3 || Predicted Probability - 0.8849270343780518


normally i wouldnt spring so much money for chocolate but being that these turinmade bars were filled with liquorflavored ganache and came in a jumbo size i figured it would be worth itha well color my face red turin did something very cute with these socalled jumbo bars kind of like how snack companies will puff their jumbosized bags full of air to make them look like they have more chips than they really contain you see these bars are definitely larger than average size in terms of length and width but their thickness is practically paper thin were talking maybe about as thin as andes candy maybe even a thinner than that so they probably have no more volume than a hershey bar that has been spread out really thinlyas for the taste absolutely nothing mind blowing to justify the price just sort of meh i cant even honestly say that i tasted any of the flavors either on first bite id get a mild hint of baileys or kahlua but then nothing much afterwards just a blamd sweetened filling the chocolate too was nothing great kind of mediocre so needless to say id definitely pass on these the packaging looks nice the bars are huge and the flavors exotic but the product is average and overpriced


Rating - 2 || Predicted Probability - 0.6920400857925415


these have pretty good flavor which doesnt disappear after the first minute they are stickier than charms sour balls or lifesavers but they are each individually wrapped so they dont stick to each other and half a roll doesnt grab all your pocket lint


Rating - 3 || Predicted Probability - 0.8082432746887207


Question 6.18: What are the most popular words in incorrectly predicted negative reviews?

In [0]:
GetWordCloud(df.iloc[incorrect_ind_neg,-1].to_frame())

Question 6.19: What does the distribution of rating look like for incorrectly predicted negative reviews?

In [0]:
GetRelativeFrequency(df.iloc[incorrect_ind_neg,:])

Question 6.20: What does the distribution of predicted probabilities look like for incorrectly and correctly predicted positive reviews?

In [0]:
sns.distplot(pred[incorrect_ind_pos], hist=True, kde=True, 
             bins=int(180/5), color = 'darkblue', 
             hist_kws={'edgecolor':'black'},
             kde_kws={'linewidth': 4},
             axlabel = "Predicted Probabilities")

sns.distplot(pred[correct_ind_pos], hist=True, kde=True, 
             bins=int(180/5), color = 'darkorange', 
             hist_kws={'edgecolor':'black'},
             kde_kws={'linewidth': 4},
             axlabel = "Predicted Probabilities")
Out[0]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f121336d4a8>

Question 6.21: What does the distribution of predicted probabilities look like for incorrectly and correctly predicted negative reviews?

In [0]:
sns.distplot(pred[incorrect_ind_neg], hist=True, kde=True, 
             bins=int(180/5), color = 'darkblue', 
             hist_kws={'edgecolor':'black'},
             kde_kws={'linewidth': 4},
             axlabel = "Predicted Probabilities")

sns.distplot(pred[correct_ind_neg], hist=True, kde=True, 
             bins=int(180/5), color = 'darkorange', 
             hist_kws={'edgecolor':'black'},
             kde_kws={'linewidth': 4},
             axlabel = "Predicted Probabilities")
Out[0]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f1210cd94a8>

Question 6.22: What does the distribution of predicted probabilities look like for different ratings?

| Rating | Comments |
|--------|----------|
| 1 | Expectedly approximates an exponential distribution with a sharp slope; nothing out of sorts. |
| 2 | Expectedly approximates an exponential distribution with a slightly less sharp slope; nothing out of sorts. |
| 3 | In practice these are neutral reviews, but for this analysis I set them up as negative reviews; it is likely the network shifted the neutral sentiments towards zero to help itself classify these as negative reviews. This is explained by a slightly less sharp drop in the slope, in comparison with the predicted probabilities for ratings 1 & 2. It is likely that neutral reviews with negative undertones ended up having probabilities close to zero, neutral reviews with mixed sentiments shifted from the 0.4 - 0.5 range to the 0.10 - 0.3 range, and likewise neutral reviews with positive sentiments probably shifted from the 0.5 - 0.65 range to the 0.3 - 0.5 range. |
| 4 | The distribution of predicted probabilities for rating 4 expectedly follows a negative exponential distribution, but the slope of the curve isn't as sharp as I'd want it to be. It should be noted that around half of the wrongly predicted reviews were of rating 4, whereas the proportion of rating-4 reviews contributing to the positive-sentiment block was a mere ~21%. I cannot hypothesize what the reason behind this could be until I go through a substantial chunk of rating-4 reviews with low predicted probability. |
| 5 | Expectedly approximates a negative exponential distribution with a sharp slope; nothing out of sorts. |

Overall everything seems to be working fine except for reviews with rating 4, where the number of incorrect predictions is highly disproportionate.
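That disproportion can be quantified by grouping prediction correctness by rating. A minimal sketch; `scores`, `y_true`, and `y_pred` are small synthetic stand-ins for the notebook's `df.Score`, `Y`, and `pred_round`:

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins; in the notebook these would be df.Score, Y, pred_round.
scores = np.array([1, 2, 3, 4, 4, 4, 5, 5])
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 1, 1, 1])

errors = pd.DataFrame({"Score": scores, "correct": y_true == y_pred})
by_rating = errors.groupby("Score")["correct"].agg(["sum", "size"])
by_rating["error_rate"] = 1 - by_rating["sum"] / by_rating["size"]
print(by_rating)  # in this toy data, rating 4 shows the highest error rate
```

Run against the real columns, this gives the per-rating error rates behind the countplot in Question 6.22.1.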

In [0]:
f, axes = plt.subplots(2, 3, figsize=(12, 12))
axes[1][2].set_axis_off()
for i,r,c in [[1, [0,0],"red"],[2, [0,1],"orange"],[3, [0,2],"blue"],[4, [1,0],"yellow"],[5, [1,1],"green"]]:
  sns.distplot(pred[df.Score == i], hist=True, kde=True, 
             bins=int(180/5), color = c, 
             hist_kws={'edgecolor':'black'},
             kde_kws={'linewidth': 4},
             axlabel = f"Predicted Probabilities for Rating - {i}",
             ax=axes[r[0], r[1]])

Question 6.22.1: Do reviews with rating 4 have disproportionately high incorrect predictions?

Yes.

In [0]:
df["pred_prob"] = pred
df["predicted_correctly"] = pred_round == Y
sns.countplot(x = "Score", hue = "predicted_correctly", data = df)
Out[0]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fd26d6d1ef0>

Question 6.22.2: What do reviews with rating 4 and predicted probability less than 0.11 look like?

Sampling 5 reviews from the said distribution

| # | Rating | Original Label | My Label | Predicted Probability | Model Performance |
|---|--------|----------------|----------|-----------------------|-------------------|
| 1 | 4 | Positive | Mixed Sentiments, Comparison | 0.04 (too low) | Not Satisfactory |
| 2 | 4 | Negative | Mixed Sentiments, Suggestion, Expectations | 0.06 | Satisfactory |
| 3 | 4 | Positive | Negative Sentiments | 0.04 | Satisfactory |
| 4 | 4 | Positive | Negative Sentiments, Criticism | 0.08 | Satisfactory |
| 5 | 4 | Positive | Mixed Sentiments, Negative Inclination | 0.015 | Satisfactory |
In [0]:
index = list(df[(df.pred_prob < 0.11) & (df.Score == 4)].index)
GetSampleReviewsWithDetails(index,5, col_indices=[-3, -7])
i was surprised at how soft the cookie was i usually buy little debbies cookies and i noticed the quaker soft baked oatmeal cookie was bigger and had a bold taste


Rating - 4 || Predicted Probability - 0.10463613271713257


a while back i ordered bottles of the torani syrup i think it was the caramel and i really like the flavor but it is like water so be prepared when you start using it the only other thing i would like would be to be able to choose different flavors instead of all flavors being the same


Rating - 4 || Predicted Probability - 0.06538110971450806


i keep buying puzzle toys for my border collie mix with the hope of finally finding one that will take longer than minutes for him to figure out i had high hopes for the everlasting treat ball after reading so many good reviews however when i gave him the ball with the treat in it he popped the treat out in about minutes i tried flipping the treat over so it points in as suggested by other reviewers and that lasted another minutes or so then i jammed it completely inside the ball that bought me another minutes i eventually


Rating - 4 || Predicted Probability - 0.04892462491989136


amazon indicates that the normal price for the lbs of dog food is about and that they are selling it for about i think they need to do some proof reading before placing product online


Rating - 4 || Predicted Probability - 0.08693265914916992


i liked these they were a bit spicy but a slice of chedder melted on top turned them into a very high protein nachothe lemon herb were horrible smelled musty and stale poor combination of herbs with the lemon they became expensive dog treatswould buy againbut never the lemon herbyuk


Rating - 4 || Predicted Probability - 0.0152207612991333


Question 6.22.3: How are words distributed for reviews with rating 4 and predicted probability less than 0.11?

As explained above, one can't read much from a word cloud, but even the higher-level presence of words such as "disappointed", "prefer", "hard", and "though" logically reconciles with the low predicted probability.

In [0]:
GetWordCloud(df[(df.pred_prob < 0.11) & (df.Score == 4)].seq_len_text.to_frame())
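The probability-band filter repeated in the sections below can be factored into a small helper. A sketch under the assumption of a DataFrame with `pred_prob` and `Score` columns; the helper name `band_indices` and the demo frame are hypothetical:

```python
import pandas as pd

def band_indices(df, lo, hi, score=4):
    """Indices of rows with the given score and lo <= pred_prob < hi."""
    mask = (df.pred_prob >= lo) & (df.pred_prob < hi) & (df.Score == score)
    return list(df[mask].index)

# Hypothetical demo frame standing in for the notebook's df
demo = pd.DataFrame({
    "Score":     [4, 4, 4, 5],
    "pred_prob": [0.05, 0.15, 0.25, 0.15],
})
print(band_indices(demo, 0.11, 0.21))  # only the rating-4 row at 0.15
```

With such a helper, each of the following cells would reduce to something like `GetSampleReviewsWithDetails(band_indices(df, 0.11, 0.21), 5, col_indices=[-3, -7])`.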

Question 6.22.4: What do reviews with rating 4 and predicted probability between 0.11 - 0.21 look like?

Sampling 5 reviews from the said distribution

| # | Rating | Original Label | My Label | Predicted Probability | Model Performance |
|---|--------|----------------|----------|-----------------------|-------------------|
| 1 | 4 | Positive | Negative Sentiments, Incoherent | 0.20 | Satisfactory |
| 2 | 4 | Positive | Negative Sentiments, Comparison | 0.15 | Satisfactory |
| 3 | 4 | Negative | Mixed Sentiments, Negative Inclination | 0.16 | Satisfactory |
| 4 | 4 | Positive | Mixed Sentiments, Brief | 0.20 | Satisfactory |
| 5 | 4 | Positive | Negative Sentiments | 0.13 | Satisfactory |
In [0]:
index = list(df[(df.pred_prob >= 0.11) & (df.pred_prob < 0.21) & (df.Score == 4)].index)
GetSampleReviewsWithDetails(index,5, col_indices=[-3, -7])
not a really drinker dont keep booze in the house that said every once in a while i get in the mood for a libation always liked a long island iced tea but to make one requires buying a liquor stores worth of clears gave this a shot for like bucks in a big liter bottle totally on a whim when out to pick up a red wine for guests how awful could it be no worse than any of the other premade cocktails which is to say horrible ive had figured if it was terrible i could mix it


Rating - 4 || Predicted Probability - 0.2014760673046112


i love honey nut cheerios one of my favorite cereals of all time so i was intrigued by the new crunchy nut roasted nut and honey from kellogs i tried it first as dry cereal and was less than impressed the cereal is corn based so it had the aftertaste similar to corn pops cereal after the honey flavor wore off i believe cheerios are whole grain and oats so there is a very distinct taste differenceafter trying them with milk i am much more impressed with the cereal they stay crispy for a long time much longer than cheerios


Rating - 4 || Predicted Probability - 0.15157657861709595


i am a puerh tu cha addict when amazon stopped carrying the rishi brand i ordered this it isnt the same this is a loose leaf tea for some reason there are a lot of stemstwigs in the mix last i checked i was looking for full leaf if you can get it do try the rishi brand the tu cha is a bowl tea it is individually wrapped in little bowl shapes inside a canister what i do like about the serendipitea brand is the packaging inside the cardboard box is a lined paper bag that you can reclose


Rating - 4 || Predicted Probability - 0.1608629822731018


i think this is a great product but my picky dog gets tired of it after a week


Rating - 4 || Predicted Probability - 0.1973532736301422


its good for your health for sure but the taste is not delicious as advertised lol it tastes kind of badif you are planning on doing salad dressing with it its ok but this you cant use it plainunless you dont mind the taste lol


Rating - 4 || Predicted Probability - 0.12896999716758728


Question 6.22.5: How are words distributed for reviews with rating 4 and predicted probability between 0.11 - 0.21?

Nothing much to make sense of here, which again shows why it would be counter-productive to assign significant value to word counts in a sequence-learning task.

In [0]:
GetWordCloud(df[(df.pred_prob >= 0.11) & (df.pred_prob < 0.21) & (df.Score == 4)].seq_len_text.to_frame())

Question 6.22.6: What do reviews with rating 4 and predicted probability between 0.21 - 0.31 look like?

| # | Rating | Original Label | My Label | Predicted Probability | Model Performance |
|---|--------|----------------|----------|-----------------------|-------------------|
| 1 | 4 | Positive | Negative Sentiments | 0.26 | Satisfactory |
| 2 | 4 | Positive | Mixed Sentiments, Caution | 0.15 | Satisfactory |
| 3 | 4 | Negative | Mixed Sentiments, Brief | 0.27 | Satisfactory |
| 4 | 4 | Positive | Mixed Sentiments, Negative Inclination | 0.27 | Satisfactory |
| 5 | 4 | Positive | Mixed Sentiments | 0.27 | Satisfactory |
In [0]:
index = list(df[(df.pred_prob >= 0.21) & (df.pred_prob < 0.31) & (df.Score == 4)].index)
GetSampleReviewsWithDetails(index,5, col_indices=[-3, -7])
i absolutely love mallomars and they are hard to find especialy in california every once in a while one or two stores will have a limited supply so i was thrilled to find them on amazon and ordered right away thhey were good but a little stale i will ceratinly try again as they are a little additive ak


Rating - 4 || Predicted Probability - 0.2583104968070984


these are tasty and light but on the smallish side they are for sure meant as a snack not as a meal replacement the chocolatepeanut butter combination is great though nothing terribly different than a lot of products already out there to tell the truth i do like the the chocolate layer on the bottom it makes it seem more like a candy bar


Rating - 4 || Predicted Probability - 0.21824103593826294


good flavor but not real sweet i add a little stevia for my sweet tooth


Rating - 4 || Predicted Probability - 0.2726106643676758


like other reviewers already stated the directions for how much ho to add is waaaaay off following the directions will yield a gummy clump the directions say to add cup of ho when i make it i add two cups of soy milk vegan butter this creates a thick creamy sauce the sauce is okay but still a bit bland for my taste i also add garlic herbs and some nutritional yeast it is delicious on its own this product sucks needs lots of extras but provides a good base for my vegan cream sauce


Rating - 4 || Predicted Probability - 0.2713000774383545


my dog will do almost anything for these treats they are a little messy since they are basically compressed balls of flaky freeze dried meat so the bottom of the treat bag is just full of meat dust essentiallynot so easy to use for training however murphy loves them so i will keep buying them also we got a bag that smelled a little wierd i called the company directly and they said that the treats were most likely fine but were just oxidized however still told me to take them to any store that carries their brand and that


Rating - 4 || Predicted Probability - 0.27170467376708984


Question 6.22.7: How are words distributed for reviews with rating 4 and predicted probability between 0.21 - 0.31?

Slightly different from the previous distribution.

In [0]:
GetWordCloud(df[(df.pred_prob >= 0.21) & (df.pred_prob < 0.31) & (df.Score == 4)].seq_len_text.to_frame())

Question 6.22.8: What do reviews with rating 4 and predicted probability between 0.31 - 0.41 look like?

Sampling 5 reviews from the said distribution

| # | Rating | Original Label | My Label | Predicted Probability | Model Performance |
|---|--------|----------------|----------|-----------------------|-------------------|
| 1 | 4 | Positive | Mixed Sentiments, Brief | 0.35 | Satisfactory |
| 2 | 4 | Positive | Positive Sentiments, Slight Incoherence | 0.34 | Not Satisfactory |
| 3 | 4 | Negative | Mixed Sentiments | 0.37 | Satisfactory |
| 4 | 4 | Positive | Mixed Sentiments, Brief | 0.36 | Satisfactory |
| 5 | 4 | Positive | Positive Sentiments, Suggestions | 0.39 | Satisfactory |
In [0]:
index = list(df[(df.pred_prob >= 0.31) & (df.pred_prob < 0.41) & (df.Score == 4)].index)
GetSampleReviewsWithDetails(index,5, col_indices=[-3, -7])
not quite as tasty as the made in nature dried plums they are very good and a little less expensive


Rating - 4 || Predicted Probability - 0.35355478525161743


i absolutely am in love with these jelly babies do not hesitate at all to buy these jelly babies i purchased them just like everyone else i am a fan of doctor who and i was curious now i know why the fourth doctor was obsessed with them i have been ordering them one bag at a time now for a little bit the most recent time i decided to buy two bags it would have been a brilliant decision if it wasnt degrees out the day they were delivered needless to say my jelly babies were basically justjelly it


Rating - 4 || Predicted Probability - 0.34572815895080566


this is about as healthty as you can get when it comes to snacks i like to dip them in almond butter for some added protein i would rate them a five but like all the marys products they are fragile and you end up with too many small pieces in the bottom of the bag


Rating - 4 || Predicted Probability - 0.3703000843524933


this is an excellent productnot the best but certainly worth a try not as creamy as some but the flavor is there a little dry


Rating - 4 || Predicted Probability - 0.36491817235946655


this is a very tasty product and represents a quick easy meal to prepare the directions are written in german which i happen to read it would be helpful if this was disclosed in the amazon product description for the benefit of nongerman speakers another alternative would be to post an english translation of the instructions to a web site and to enclose said web address to the amazon listing just a thought


Rating - 4 || Predicted Probability - 0.39312514662742615


Question 6.22.9: How are words distributed for reviews with rating 4 and predicted probability between 0.31 and 0.41 ?

Not much different from the previous distribution, and again not very useful.

In [0]:
GetWordCloud(df[(df.pred_prob >= 0.31) & (df.pred_prob < 0.41) & (df.Score == 4)].seq_len_text.to_frame())

Question 6.22.10: What do reviews with rating 4 and predicted probability between 0.41 and 0.50 look like ?

Sampling 5 reviews from this probability band

| # | Rating | Original Label | My Label | Predicted Probability | Model Performance |
|---|--------|----------------|----------|-----------------------|-------------------|
| 1 | 4 | Positive | Mixed Sentiments, Positive Inclination | 0.49 | Satisfactory |
| 2 | 4 | Positive | Mixed Sentiments, Suggestions | 0.44 | Satisfactory |
| 3 | 4 | Negative | Positive Sentiments, Suggestion, Brief | 0.45 | Satisfactory |
| 4 | 4 | Positive | Mixed Sentiments, Positive Inclination | 0.43 | Satisfactory |
| 5 | 4 | Positive | Mixed Sentiments | 0.42 | Satisfactory |
In [0]:
index = list(df[(df.pred_prob >= 0.41) & (df.pred_prob < 0.50) & (df.Score == 4)].index)
GetSampleReviewsWithDetails(index,5, col_indices=[-3, -7])
mike and ike lovers have the chance to sample some exciting new fruit flavors in this special seasonal mummys mix box a combination of five fruit flavors in the same box lemon orange grape lime and raspberry fruit flavors are authentic with overall sweetness and just a touch of sourness to make your mouth water fresh and chewy we would have rated this stars but the fact that the box is not resealable is a negative factor but then maybe you are supposed to finish off all ounces while enjoying the movie with no need to seal the box closed


Rating - 4 || Predicted Probability - 0.49411576986312866


i got these because of their price primarily taste is great picks me up a bit but really doesnt do much i can drink two and not feel that jittery keep in mind i drink lots of caffeinated beverages so i have a tolerancei wish the extra strength came in packs of at this price that would be awesomethis is a great product but for a caffeine junkie like me its not up to snuff


Rating - 4 || Predicted Probability - 0.43978574872016907


the lemon flavored wafers are really good but the strawberry are bestim going to ask amazoncom to get them on the subscribe save programj


Rating - 4 || Predicted Probability - 0.45176970958709717


this yeast is cheaper and good compared to other yeastsbut i bought the same ounce bag for at smart final store


Rating - 4 || Predicted Probability - 0.4340389370918274


i add cinnemon andor ginger to my ground coffee its suppose to be medicinal however my taste buds must be shriveled up because i cant tast the difference in different brands of gingermy cousin gave me some from penskys and no difference here so for me this was a good price but no difference is taste for me i do like the brand


Rating - 4 || Predicted Probability - 0.4220597743988037


Question 6.22.11: How are words distributed for reviews with rating 4 and predicted probability between 0.41 and 0.50 ?

Very similar to the earlier distributions, hence not very useful.

In [0]:
GetWordCloud(df[(df.pred_prob >= 0.41) & (df.pred_prob <= 0.50) & (df.Score == 4)].seq_len_text.to_frame())

Question 6.23: How satisfied am I with the model's performance on the 25 sampled incorrectly predicted 4-star rating reviews ?

In [0]:
# 23 of the 25 sampled reviews were judged Satisfactory, 2 Not-Satisfactory
metric = ("Satisfactory " * 23).split()
metric.extend(("Not-Satisfactory " * 2).split())
sns.countplot(metric)
Out[0]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fd26083c940>

Question 6.24: How are my labels (i.e. user-defined / adjudged) distributed for the 25 sampled incorrectly predicted 4-star rating reviews ?

In [0]:
f, axes = plt.subplots(figsize=(24, 6))
# My labels for the 25 sampled reviews; a review can carry more than one
# label, so the counts sum to more than 25
labels = ("Mixed-Sentiments " * 15).split()
labels.extend(("Negative-Sentiments " * 6).split())
labels.extend(("Brief " * 5).split())
labels.extend(("Suggestion " * 4).split())
labels.extend(("Negative-Inclination " * 3).split())
labels.extend(("Positive-Sentiments " * 2).split())
labels.extend(("Positive-Inclination " * 2).split())
labels.extend(("Comparison " * 2).split())
labels.extend(("Incoherent " * 2).split())
labels.extend(("Caution " * 1).split())
labels.extend(("Expectation " * 1).split())
labels.extend(("Criticism " * 1).split())
sns.countplot(labels)
Out[0]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fca8c4618d0>

Question 6.25: Can I proceed with the model ?

The analysis of the 25 sampled incorrectly predicted reviews with rating 4 suggests that the model is doing well even where it's not expected to. Yes, I should proceed with the model.

7. Model Test

In this section I will test the model on previously unseen data from the web.

In [0]:
# Converting a review to a network-feedable format
def GetSingleSequence(sentence):
    sentence = sentence.lower()
    # Removing html tags
    sentence = re.compile(r'<[^>]+>').sub('', sentence)
    # Removing punctuation
    sentence = re.sub(r'[^\w\s]', '', sentence)
    # Truncating to the model's sequence length
    words = sentence.split()[:seq_len]
    # Mapping each word to its vocab index; out-of-vocabulary words map to len(vocab)
    sequence = [[vocab[x].index] if x in vocab.keys() else [len(vocab)] for x in words]
    sequence = PadSequence(sequence)
    X = np.array(sequence).reshape(1, seq_len)
    return X

Question 7.1: How does the model perform on a lengthy positive review with overwhelmingly positive sentiments?

Review: These 1-min oats are a life saver. I love that it’s so practical to use. Usually in the morning I just pour in the amount I want and then add just some milk and put in microwave. While the coffee is brewing, it’ll be done at the time. I love that it doesn’t have any additives in it so I can add whatever I want depending on my mood. Some days I’ll add berries and banana, other days I’ll add chocolate chips and nuts or cinnamon and honey would do. I noticed that with a full bowl of these oats, my tummy is satisfied for longer so I cut down on eating unnecessary items in between.

Rating: 5

Source

In [0]:
review = "These 1-min oats are a life saver. I love that it’s so practical to use. Usually in the morning I just pour in the amount I want and then add just some milk and put in microwave. While the coffee is brewing, it’ll be done at the time. I love that it doesn’t have any additives in it so I can add whatever I want depending on my mood. Some days I’ll add berries and banana, other days I’ll add chocolate chips and nuts or cinnamon and honey would do. I noticed that with a full bowl of these oats, my tummy is satisfied for longer so I cut down on eating unnecessary items in between."
print(f"{model.predict(GetSingleSequence(review))[0][0]} - Pretty Good !!! ")
0.9730412364006042 - Pretty Good !!! 

Question 7.2: How does the model perform on a short positive review with overwhelmingly positive sentiments?

Review: I CAN'T STOP EATING FLAMIN HOT CHEETOS, they're my weakness, idc the presentation, I love them in every way. They have the perfect spicy flavor.

Rating: 5

Source

In [0]:
review = "I CAN'T STOP EATING FLAMIN HOT CHEETOS, they're my weakness, idc the presentation, I love them in every way. They have the perfect spicy flavor. "
print(f"{model.predict(GetSingleSequence(review))[0][0]} - Awesome !!! ")
0.994418740272522 - Awesome !!! 

Question 7.3: How does the model perform on a medium-length review with mixed sentiments?

Review: "Love Cheerios, especially the frosted kind. Only reason for 4 stars instead of 5 is the sugar content. They don't really contain much nutritional value for a complete breakfast but they sure are tasty."

Rating: 4

Source

In [0]:
review = "Love Cheerios, especially the frosted kind. Only reason for 4 stars instead of 5 is the sugar content. They don't really contain much nutritional value for a complete breakfast but they sure are tasty."
print(f"{model.predict(GetSingleSequence(review))[0][0]} - Consistent !! ")
0.40862926840782166 - Consistent !! 

Question 7.4: How does the model perform on a medium-length review with mixed sentiments and a negative inclination?

Review:

I tried these to compare with Frosted Flakes. They were just ok, I still prefer Honey Nut. Bought at Kroger and they did have several Cheerio choices. Will keep buying the honey nut and would suggest others to the same!

Rating: 3

Source

In [0]:
review = "I tried these to compare with Frosted Flakes. They were just ok, I still prefer Honey Nut. Bought at Kroger and they did have several Cheerio choices. Will keep buying the honey nut and would suggest others to the same!"
print(f"{model.predict(GetSingleSequence(review))[0][0]} - Good !! ")
0.0421736016869545 - Good !! 

Question 7.5: How does the model perform on a lengthy review with overwhelmingly negative sentiments?

Review: These are wayyy too sugary for me. With all the added sugars, I don't really think these can even still be considered healthy. But since it's under the Cheerios name, many will just associate this with a healthy morning option. Definitely not a great start to your day, unless you were planning on working a sugar crash into your busy schedule. I never actually buy this cereal directly, I always end up getting it when I purchase those mini cereal variety boxes, and then I eat it at 2 a.m. when I am truly desperate for a snack. These are wayyy too sugary for me. With all the added sugars, I don't really think these can even still be considered healthy. But since it's under the Cheerios name, many will just associate this with a healthy morning option. Definitely not a great start to your day, unless you were planning on working a sugar crash into your busy schedule. I never actually buy this cereal directly, I always end up getting it when I purchase those mini cereal variety boxes, and then I eat it at 2 a.m. when I am truly desperate for a snack.

Rating: 2

Source

In [0]:
review = "These are wayyy too sugary for me. With all the added sugars, I don't really think these can even still be considered healthy. But since it's under the Cheerios name, many will just associate this with a healthy morning option. Definitely not a great start to your day, unless you were planning on working a sugar crash into your busy schedule. I never actually buy this cereal directly, I always end up getting it when I purchase those mini cereal variety boxes, and then I eat it at 2 a.m. when I am truly desperate for a snack. These are wayyy too sugary for me. With all the added sugars, I don't really think these can even still be considered healthy. But since it's under the Cheerios name, many will just associate this with a healthy morning option. Definitely not a great start to your day, unless you were planning on working a sugar crash into your busy schedule. I never actually buy this cereal directly, I always end up getting it when I purchase those mini cereal variety boxes, and then I eat it at 2 a.m. when I am truly desperate for a snack."
print(f"{model.predict(GetSingleSequence(review))[0][0]} - Pretty Good !!! ")
0.08640038222074509 - Pretty Good !!! 

Question 7.6: How does the model perform on a short negative review with overwhelmingly negative sentiments?

Review: "I don't think these are good at all. Someone recommend these to me to try. I will not buy these ever.."

Rating: 1

Source

In [0]:
review = "I don't think these are good at all. Someone recommend these to me to try. I will not buy these ever.."
print(f"{model.predict(GetSingleSequence(review))[0][0]} - Excellent !!! ")
0.02735918201506138 - Excellent !!! 

8. Transfer Learning

In this section I am exploring the possibility of using learned weights from the model to predict sentiments in different problem domains.

8.1 Restaurant Reviews

Question 8.1.1: How does the model perform on a positive review ?

Review: I love to treat myself for lunch here! Sandwiches are 5.99, drinks 1, combo 9.99 and that includes two sides and a drink (pop or juice). It's also delicious and the staff are kind. I almost always get a combo with potatoes, salad and the falafel wrap. It's very filling! The potatoes are roasted and seasoned wonderfully and they can put a creamy garlic sauce over them. The salad is spiced with vinegar and roasted thyme. YUM! The falafel is perfect! Crunchy outside, soft inside, seasoned, it's great. The rice is the only thing I'm not excited about. It doesn't have as strong a flavor as I would like but if you like a blander taste you will probably enjoy it. Overall, very affordable and at a great cost. I could not recommend this place more.

Rating: 5

Source - Read Fushcia H.'s review of Al-Madina Market & Grill on Yelp

In [0]:
review = "I love to treat myself for lunch here! Sandwiches are 5.99, drinks 1, combo 9.99 and that includes two sides and a drink (pop or juice). It's also delicious and the staff are kind. I almost always get a combo with potatoes, salad and the falafel wrap. It's very filling! The potatoes are roasted and seasoned wonderfully and they can put a creamy garlic sauce over them. The salad is spiced with vinegar and roasted thyme. YUM! The falafel is perfect! Crunchy outside, soft inside, seasoned, it's great. The rice is the only thing I'm not excited about. It doesn't have as strong a flavor as I would like but if you like a blander taste you will probably enjoy it. Overall, very affordable and at a great cost. I could not recommend this place more."
print(f"{model.predict(GetSingleSequence(review))[0][0]} - Awesome like Al-Madina!!! ")
0.9842723608016968 - Awesome like Al-Madina!!! 

Question 8.1.2: How does the model perform on a negative review ?

Review: Extremely poor service. They have no servers. They take an order and don't even bring it. So you are left with no food on the table.
We were there today and ordered for bhindi masala and hariyali kofta and Naan. The haryali kofta came in with naan, We had to remind the server twice to bring in the rice.Then we were waiting for the bhindi masala to be brought to the table. In the mean while the other orders were being brought out and no sign of our second entree. We kept reminding them about the order and all we got was yes it is getting ready. By the third reminder I was so upset and told him he should cancel the bhindi masala order as we were almost done eating. The nerve he had to tell us "oh we are just bringing it out". I was really upset about it, we had waited for almost 30 mins for the bhindi masala. We just finished our one entree and left, there was no sorry nothing from the person taking the bills. That's pretty rude. Don't know how long this restaurant can function with this kind of service. Would like to put Zero stars for this like of service.

Rating: 1

Source - Read Avisha G.'s review of Ravis Hyderabad House on Yelp

In [0]:
review = "Extremely poor service. They have no servers. They take an order and don't even bring it. So you are left with no food on the table. We were there today and ordered for bhindi masala and hariyali kofta and Naan. The haryali kofta came in with naan, We had to remind the server twice to bring in the rice.Then we were waiting for the bhindi masala to be brought to the table. In the mean while the other orders were being brought out and no sign of our second entree. We kept reminding them about the order and all we got was yes it is getting ready. By the third reminder I was so upset and told him he should cancel the bhindi masala order as we were almost done eating. The nerve he had to tell us oh we are just bringing it out. I was really upset about it, we had waited for almost 30 mins for the bhindi masala. We just finished our one entree and left, there was no sorry nothing from the person taking the bills. That's pretty rude. Don't know how long this restaurant can function with this kind of service. Would like to put Zero stars for this like of service."
print(f"{model.predict(GetSingleSequence(review))[0][0]} - Decent ! ")
0.24379348754882812 - Decent ! 

8.2 Movie / Series Reviews

Question 8.2.1: How does the model perform on a positive review ?

Review: A glorious train wreck that will not let you look away! You should be inside already, sit back and enjoy the show!

Rating: 5

Source - Review by Tom A at Rotten Tomatoes

In [0]:
review = "A glorious train wreck that will not let you look away! You should be inside already, sit back and enjoy the show!"
print(f"{model.predict(GetSingleSequence(review))[0][0]} - 5 stars for the prediction. Roar!!! ")
0.8505125641822815 - 5 stars for the prediction. Roar!!! 

Question 8.2.2: How does the model perform on a negative review ?

Review: Season 8 of Game of Thrones had some of the laziest writing I have ever seen. The writers abandoned every storyline, character storylines/arcs, and reverse engineered everything to get their mad queen narrative. Season 8 quite simply, was an insult to viewers and GOT fans everywhere.

Rating: 0.5

Source - Review by Jess C at Rotten Tomatoes

In [0]:
review = "Season 8 of Game of Thrones had some of the laziest writing I have ever seen. The writers abandoned every storyline, character storylines/arcs, and reverse engineered everything to get their mad queen narrative. Season 8 quite simply, was an insult to viewers and GOT fans everywhere."
print(f"{model.predict(GetSingleSequence(review))[0][0]} - Average, Just like GOT Season 8 ! ")
0.3890485167503357 - Average, Just like GOT Season 8 ! 

8.3 Tweets

Question 8.3.1: How does the model perform on a positive tweet ?

Tweet:

We hope you have a great weekend.

We hope you stay at home.

We hope you stay away from gatherings and practice physical distancing.

We hope you decide to protect others, protect our community.

We hope you take #COVID19 seriously.

#InThisTogether

Source - Twitter

In [0]:
review = "We hope you have a great weekend. We hope you stay at home. We hope you stay away from gatherings and practice physical distancing. We hope you decide to protect others, protect our community. We hope you take #COVID19 seriously. #InThisTogether"
print(f"{model.predict(GetSingleSequence(review))[0][0]} - Positive!! ")
0.72294020652771 - Positive!! 

Question 8.3.2: How does the model perform on a negative tweet ?

Tweet:

@airtelindia

@Airtel_Presence Call back from network team is complete lie. They give a single ring missed call and say they could reach us.

Network provider cannot reach customer on their own network lol. Irony

Source - Twitter

In [0]:
review = "Call back from network team is complete lie. They give a single ring missed call and say they could reach us. Network provider cannot reach customer on their own network lol. Irony"
print(f"{model.predict(GetSingleSequence(review))[0][0]} - Spot On !!! ")
0.04222555086016655 - Spot On !!! 

8.4 Sports Reactions/Comments

Question 8.4.1: How does the model perform on a positive comment ?

Comment:

He’s hugely popular in the squad and his outlook and fun-loving nature generates a sense of collective well-being and togetherness. I know social media isn’t a true barometer of anything, but look at the way he interacts with other players on Instagram and how they respond. It’s not just his so-called mates either, it’s senior players and young players, right throughout the squad.

Source: Arseblog

In [0]:
review = "He’s hugely popular in the squad and his outlook and fun-loving nature generates a sense of collective well-being and togetherness. I know social media isn’t a true barometer of anything, but look at the way he interacts with other players on Instagram and how they respond. It’s not just his so-called mates either, it’s senior players and young players, right throughout the squad."
print(f"{model.predict(GetSingleSequence(review))[0][0]} - Decent! ")
0.6613869667053223 - Decent! 

Question 8.4.2: How does the model perform on a negative comment ?

Comment:

The issues were endless. Fractious relationships with players; Ozil and Ramsey left out then brought back; a lack of genuine authority despite trying to be an authoritarian when he first took over; no defined style of play; our captain basically destroying his own legacy to get away from the club as quickly as possible this summer (more alarm bells); the indecision over simple things like who should be captain; the Xhaka situation for which the player deserves criticism for his reaction, but under which Emery had lit a fuse that never needed to be lit with his handling of the captaincy; poor communication, and clumsy attempts to connect with fans who had long lost faith; there was just so much in 18 months that it had to come to head.

Source: Arseblog

In [0]:
review = "The issues were endless. Fractious relationships with players; Ozil and Ramsey left out then brought back; a lack of genuine authority despite trying to be an authoritarian when he first took over; no defined style of play; our captain basically destroying his own legacy to get away from the club as quickly as possible this summer (more alarm bells); the indecision over simple things like who should be captain; the Xhaka situation for which the player deserves criticism for his reaction, but under which Emery had lit a fuse that never needed to be lit with his handling of the captaincy; poor communication, and clumsy attempts to connect with fans who had long lost faith; there was just so much in 18 months that it had to come to head."
print(f"{model.predict(GetSingleSequence(review))[0][0]} - Great !! ")
0.1975400149822235 - Great !! 

8.5 Are there any examples where the model is not performing well ?

8.5.1: Criticising the critics

Review:

This movie should not be rated by any "critic" this show has a following that spans millions and millions of fans. As a true fan of this show and these guys, I rate it a popping 5 Stars.. The movie was excellent all around. It's just sad to see some guy who probably sat and watched this movie with Oscar Award goggles on. Hahahahahahahahaha... now that was funny.

My Label: Positive Review with criticism for critics

Rating: 5

Model Performance: Although the overall sentiment of the statement is objectively mixed, with respect to the movie the sentiment is clearly positive; the model does not take this context-specific positivity into account and instead takes a generalized approach to the sentiment of the statement.

Source

In [0]:
review = "This movie should not be rated by any critic this show has a following that spans millions and millions of fans. As a true fan of this show and these guys, I rate it a popping 5 Stars.. The movie was excellent all around. It's just sad to see some guy who probably sat and watched this movie with Oscar Award goggles on. Hahahahahahahahaha... now that was funny."
print(f"{model.predict(GetSingleSequence(review))[0][0]} - Confused !! ")
0.3304857313632965 - Confused !! 

8.5.2: Sarcasm

Review:

God Himself could not record as good a greatest hits album like this, and if He were to listen to all 17 tracks on this compilation, He would refrain from striking me down for blasphemy. The song “Hot Shot City” is particularly good.

My Label: Negative

Model Performance: It would be too much to expect the model to understand sarcasm, especially when the text is not accompanied by pauses and tonal modulations. Maybe in a parallel universe where everyone reviews like Chandler Bing, this model would still give the same results ;).

Source

In [0]:
review = "God Himself could not record as good a greatest hits album like this, and if He were to listen to all 17 tracks on this compilation, He would refrain from striking me down for blasphemy. The song “Hot Shot City” is particularly good."
print(f"{model.predict(GetSingleSequence(review))[0][0]} - Bazinga !!! ")
0.9112076759338379 - Bazinga !!! 

9. Results

9.1 Summary

From the analysis, it is quite evident that the model developed is well-equipped to classify reviews according to their sentiments.

Initially, I set up the dataset in my Google Drive, as Colab doesn't hold datasets in the environment for long, and each time the runtime got disconnected (which is quite often) I would otherwise need to re-upload the dataset. I then performed EDA to check whether anything in the dataset was statistically deceptive.

I then cleaned the data of punctuation and HTML tags, and converted the task into a binary classification problem: reviews with rating (or score) 1, 2 & 3 were labeled as negative sentiment (0) and reviews with rating 4 & 5 were labeled as positive sentiment (1).
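The score-to-label mapping can be sketched in a couple of lines of pandas (the toy frame and the `sentiment` column name here are illustrative, not the notebook's actual variables):

```python
import pandas as pd

# Toy frame standing in for the review dataset; Score is the 1-5 star rating
df = pd.DataFrame({"Score": [1, 2, 3, 4, 5],
                   "Text": ["awful", "meh", "ok", "good", "great"]})

# Ratings 1-3 -> negative sentiment (0); ratings 4-5 -> positive sentiment (1)
df["sentiment"] = (df["Score"] >= 4).astype(int)
print(df["sentiment"].tolist())  # -> [0, 0, 0, 1, 1]
```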

I then fed the data to the Word2Vec method from the Gensim library, which returned a word-vector object containing the contextualized embedding matrix, vocabulary, index2word mapping, etc. I then used the vocabulary to set up the training data and labels in a network-feedable format.

In the next section, I iteratively developed a Recurrent Neural Network model using the super-easy Keras wrapper of the TensorFlow framework. I also assigned appropriate weights to each class to account for the massive class imbalance. After finalizing the model, I analyzed its performance. Overall the model seemed to be doing pretty well (i.e. 90% train and validation accuracy), but careful analysis made it evident that the model does well on all types of reviews with the exception of reviews with rating 4.
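The class-weighting step can be sketched as follows; the imbalance ratio here is made up, and `compute_class_weight` is just one convenient way to derive weights that Keras's `class_weight` argument accepts:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy labels with the kind of imbalance the review data shows:
# far more positive (1) than negative (0) examples
y = np.array([1] * 80 + [0] * 20)

weights = compute_class_weight(class_weight="balanced",
                               classes=np.array([0, 1]), y=y)
class_weight = {c: float(w) for c, w in zip([0, 1], weights)}
print(class_weight)  # -> {0: 2.5, 1: 0.625}: the minority class is up-weighted

# Keras accepts this dict directly, e.g.:
# model.fit(X_train, y_train, class_weight=class_weight, ...)
```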

Reviews with rating 4 had a disproportionately high number of incorrect negative predictions, but a detailed analysis of samples from different probability ranges revealed that although these reviews carried a rating of 4, the overwhelming sentiment in them was either mixed or negative, hence the incorrect prediction in terms of rating. With respect to the sentiment of these reviews, the "incorrect" prediction actually made more sense.

One explanation for why reviews with rating 4 often had negative or mixed sentiments is that, generally, people review things in one of two ways: the bottom-up approach (i.e. 0 to 5), where the reviewer starts from zero and adds points incrementally, and the opposite top-down approach (i.e. 5 to 0), where one assigns full points to a product and then deducts points with each failure in a sequential assessment of the product. The fact that the network could learn the second type of review without being explicitly told (i.e. no labels of such nature were provided) shows the immense power of LSTM (Long Short-Term Memory) cells and the RNN architecture, and it also reminded me of a popular blog post by Andrej Karpathy, The Unreasonable Effectiveness of Recurrent Neural Networks.

Further, I validated the model with previously unseen data from the web, and it performed as expected. I then tried to predict sentiments on statements and reviews from other domains by pooling data from different sources, to see whether the learned weights could serve other tasks as well, and the results were promising.

Although I cannot claim that the learned weights alone suffice to predict sentiments in different domains, one can certainly transfer the learning from one task to another of a similar nature: on top of the learned weights one can add a small domain-specific network adept at classifying sentiments for that domain, thereby greatly reducing training time.
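A minimal Keras sketch of that idea, assuming an Embedding → LSTM → Dense base network as described above (all layer sizes here are placeholders, not the notebook's actual hyperparameters):

```python
import tensorflow as tf
from tensorflow import keras

seq_len, vocab_size, emb_dim = 64, 1000, 50

# Stand-in for the trained review-sentiment network
base = keras.Sequential([
    keras.layers.Embedding(vocab_size, emb_dim),
    keras.layers.LSTM(32),
    keras.layers.Dense(1, activation="sigmoid"),
])

# Transfer: reuse the embedding + LSTM as a frozen feature extractor and
# train only a small task-specific head on data from the new domain
feature_extractor = keras.Sequential(base.layers[:-1])
feature_extractor.trainable = False

transfer_model = keras.Sequential([
    feature_extractor,
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
transfer_model.compile(optimizer="adam", loss="binary_crossentropy")

# Forward pass on a dummy batch of token-index sequences
probs = transfer_model(tf.zeros((2, seq_len), dtype=tf.int32))
print(probs.shape)  # one sigmoid probability per input sequence
```

Only the two small Dense layers would be updated during fitting, which is what makes training on the new domain fast.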

9.2 Key Results

  1. The final model was an RNN (with LSTM cells).
  2. The model had an initial train and validation accuracy of 90%.
  3. The model had a sensitivity of 91.44%, a specificity of 93.56% and a precision of 98.06%.
  4. Upon detailed error analysis, the model's accuracy and learning were found to be higher than the earlier assessment suggested.
  5. The model is close to the desired low-bias, low-variance region.
  6. The learned weights can be used for similar tasks in different problem areas through transfer learning.
  7. The RNN architecture (including the LSTM cell) is extremely powerful at sequence learning problems, especially in Natural Language Processing.
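The metrics in item 3 follow directly from the confusion matrix; a small sketch with made-up predictions shows the arithmetic (the 91.44% / 93.56% / 98.06% figures themselves come from the notebook's own test set):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy ground truth and predictions (positive sentiment = 1)
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 1, 0, 1])
y_pred = np.array([1, 1, 1, 0, 0, 0, 1, 1, 0, 1])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # recall on the positive class
specificity = tn / (tn + fp)   # recall on the negative class
precision = tp / (tp + fp)     # how often a positive prediction is right
print(sensitivity, specificity, precision)
```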

9.3 Application

There can be many applications for this model, but one that comes to mind right away: a company that sells food products could set up web crawlers to collect statements about its products from various social media platforms (Twitter, Facebook, LinkedIn, etc.). The model would then automatically classify the sentiment of each statement, giving a broad check on how the product is being received on a daily, weekly or monthly basis; based on the results, further analysis could be commissioned for various sub-groups.
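A sketch of the kind of rollup such a pipeline could produce, assuming each crawled statement has already been scored by the model (the dates, probabilities and column names here are invented):

```python
import pandas as pd

# Hypothetical crawled mentions with a sentiment probability already
# assigned by the model, as in model.predict(GetSingleSequence(text))
mentions = pd.DataFrame({
    "date": pd.to_datetime(["2020-05-01", "2020-05-01",
                            "2020-05-02", "2020-05-08"]),
    "pred_prob": [0.91, 0.12, 0.78, 0.40],
})
mentions["positive"] = (mentions["pred_prob"] >= 0.5).astype(int)

# Share of positive mentions per week: the kind of recurring report a
# product team could review before commissioning deeper analysis
weekly = mentions.resample("W", on="date")["positive"].mean()
print(weekly)
```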

9.4 Future Works

As far as the classifier is concerned, the scope for improvement is limited, and any significant gain would require a lengthy process of iteration. However, it would be very useful to deploy this model, have it predict sentiments on newly generated data, and receive a sampled, customized report on a daily basis to check whether the model is indeed performing as hypothesized by the analysis.

10. Acknowledgements

  1. Some of the analysis was completed using Kaggle Kernels.
  2. This analysis was later transferred to Google Colab due to environment issues.
  3. J. McAuley and J. Leskovec. From amateurs to connoisseurs: modeling the evolution of user expertise through online reviews. WWW, 2013.